Key interaction networks: Identifying evolutionarily conserved non‐covalent interaction networks across protein families

Abstract Protein structure (and thus function) is dictated by non‐covalent interaction networks. These can be highly evolutionarily conserved across protein families, the members of which can diverge in sequence and evolutionary history. Here we present KIN, a tool to identify and analyze conserved non‐covalent interaction networks across evolutionarily related groups of proteins. KIN is available for download under a GNU General Public License, version 2, from https://www.github.com/kamerlinlab/KIN. KIN can operate on experimentally determined structures, predicted structures, or molecular dynamics trajectories, providing insight into both conserved and missing interactions across evolutionarily related proteins. This provides useful insight both into protein evolution, as well as a tool that can be exploited for protein engineering efforts. As a showcase system, we demonstrate applications of this tool to understanding the evolutionary‐relevant conserved interaction networks across the class A β‐lactamases.


Structure Selection and Preparation
The Protein Data Bank (PDB, (1)) IDs for all Class A structures were obtained from the Betalactamase database (BLDB) (2) on the 5 th of June 2023.This database was then filtered to remove β-lactamases that were synthetic, chimeric or generated from ancestral sequence reconstruction to focus our study on only naturally occurring β-lactamases.In the case of β-lactamases for which more than one structure was available, the structure with the best resolution was selected for download.This resulted in 69 unique class A structures being downloaded from the PDB.
The downloaded structures went through a 5-step procedure in order to prepare them for both crystal structure analysis and molecular dynamics (MD) simulations.This protocol is provided as , where relevant, we took the first monomeric unit from each crystal structure for our simulations and analysis.This also ensured an equal weighting of each sequence in the subsequent multiple sequence alignment (MSA).( 2) MolProbity ( 6) was then used to determine the optimum tautomerization states for all histidine residues and make any required Asn/Gln side-chain flips under the criteria of optimizing the hydrogen bonding network.(3) The pKa of all residues with more than one possible protonation state was predicted using PROPKA v3.1 (7).If an Asp, Glu or His residue had a predicted pKa ≥ 8 or if an Arg, Lys or Tyr had a predicted pKa ≤ 6, they were flagged for visual inspection, and the protonation state was assigned based on examining the local hydrogen bonding network of the residue.(4) The S4 module "pdb4amber" was run on each structure in order to make each PDB file follow AmberMD formatting.( 5) Tleap (8) was used to generate an Amber compatible topology and coordinate file alongside solvate and neutralize each system (for MD simulations).Each structure was described with the Amber ff14SB (9) force field and TIP3P (10) water model.While this workflow is largely automated, some visual examination after step (2) and ( 3) is required to ensure that the procedure is appropriate for the system of choice.The final structures used for contact analysis are provided in the GitHub repository at https://www.github.com/kamerlinlab/KIN.

Structures Selected
The

Molecular Dynamics Simulations
Molecular dynamics (MD) simulations were performed using Amber20 (8) on all 69 of the βlactamase structures prepared above.To prepare each system for production MD simulations, a standard procedure of minimization, heating and equilibration was followed.

S5
For all dynamics steps, a 1 fs timestep was used alongside the SHAKE algorithm (11) to constrain all bonds involving a hydrogen atom.All NVT simulations used Langevin temperature control (collision frequency of 1 ps −1 ), whilst all NPT simulations used both Langevin temperature control (collision frequency of 1 ps −1 ) and a Berendsen barostat (( 12), 1 ps pressure relaxation time).In order to generate 5 replicas for each system, random velocities were assigned at the heating step (described below) for each system.
The steps were as follows: (1) 1000 steps of steepest descent energy minimization were performed to relax each structure.(2) The system was gradually heated from a starting temperature of 100 K to a final temperature of 300 K over the course of 1 ns.During this heating step, all protein atoms were restrained using 100 kcal mol -1 Å -2 distance restraints.(3) A 1 ns long NVT simulation was performed with this restraint retained.(4) A 1 ns long NPT simulation was performed but this time with the restraint reduced to 10 kcal mol -1 Å -2 .( 5) The restraint was then altered to be on only the backbone heavy atoms of each protein residue and another 1 ns long NPT simulation was run.(6) The restraint magnitude was then reduced to 1 kcal mol -1 Å -2 for a 1 ns long NPT simulation.(7) The restraint magnitude was then reduced to 0.1 kcal mol -1 Å -2 for a 1 ns long NPT simulation.(8) Finally, a 1 ns long MD simulation without restraints was performed as the final equilibration step.
For each system, five, 100 ns long replicas of NPT (1 atm and 300 K) production MD simulations were run.Simulations used a timestep of 2 fs with the SHAKE algorithm (11) applied to constrain all bonds to hydrogen.A 10 Å direct space nonbonded cut-off was applied with longrange electrostatics evaluated using the particle mesh Ewald algorithm (13).Temperature and pressure regulation were achieved using Langevin temperature control (collision frequency used was 1 ps −1 ) and a Berendsen barostat (pressure relaxation time used was 1 ps).Frames were saved S6 every 100 ps and used as input for contacts calculation.To prepare, for the contact analysis, CPPTRAJ ( 14) was used to post-process each trajectory (performing both imaging and removing solvent and counter ions).

Contact Analysis
In order to identify the non-covalent protein interactions present in each frame, we combined interaction definitions present in literature within a python module (included as part of the GitHub repository for this package: https://www.github.com/kamerlinlab/KIN)(15; 16).This approach can identify the following interaction types between the 20 standard amino acids: salt bridge; hydrogen bond; cation-π; π-π interaction; hydrophobic and van der Waals (vdWs) interactions (see Figure S11) and further specifies if the interaction is from the side or main chain or both parts of the chain.A salt bridge is defined as such if 2 oppositely charged heavy atoms are within 4.5 Å of one another.A hydrogen bond is defined as such if the donor-acceptor distance is ≤ 3.5 Å and the donor-hydrogen-acceptor angle is 180 ± 45°.Definitions for cation-π interactions were taken from Infield et al. (16), where a cationic residue is either Arg or Lys and the π-residue is one of Trp, Tyr, Phe or a neutral His.For Lys, there are two requirements: First, the center of mass of the ring and the Lys sidechain nitrogen must be within 6 Å of one another.Second, the angle between two vectors must be 0 ± 30°.These vectors are (1) the normal vector of the aromatic ring and (2) the vector between the Lys sidechain nitrogen and the aromatic rings center of mass.For Arg the above two requirements need to be met (instead of using the Lys sidechain nitrogen the center of mass of the guanidium group is used instead) and an additional requirement needs to be met.This third requirement is an angle measurement whereby the angle must be either 0° ± 30° (stacked cation-π interaction) or 90° ± 30° (t-shaped cation-π interaction).The angle is between the normal vector S7 of the guanidium group and the normal vector of the aromatic ring.Definitions for π-π interactions were taken from Zhao et al. (15), where a π residue is one of Trp, Tyr, Phe or a neutral His.The following requirements need to be met for a π-π interaction.First, the distance between the centres of mass of the two aromatic rings had to be ≤ 7.2 Å.Two angles were also determined to classify if the interaction was a π-π interaction.The first angle, θ, was calculated as the angle between the residue 1's vector and the π-π -90°.The second angle named delta was the angle between the residue 2's vector and the π-π -90°.If both theta and delta were between 0° ± 30° then the interaction would have no π-overlap and would not be counted.Otherwise, the interaction was labelled as a π-π interaction.A hydrophobic interaction is defined if two hydrophobic residues (Ala, Val, Leu, Ile, Pro, Trp, Phe, Cys, Met) have a non-polar carbon atom within 4 Å of one another.A van der Waals (VDW) interaction is defined as between any two atoms within 4 Å of one another.The program is written so that if an interaction more specific than an VDW interaction is found, only the more specific interaction is shown.Likewise, a salt bridge interaction will not also be defined as a hydrogen bonding interaction even if the criteria for a hydrogen bond has been met.
Interactions were defined at the sidechain/mainchain level between a pair of residues, meaning multiple interactions could be identified between a single pair of residues.To analyze the interactions for both structures and simulations, we developed a python library of definitions for each interaction type to scan against.This library made use of MDAnalysis (17; 18) and is available for download and use as part of the GitHub repository associated with this project: https://www.github.com/kamerlinlab/KIN.

Protein Sequence Analysis
Multiple sequence alignment (MSA) was performed using Modeller v.10.3 (19).Parameters for this procedure were following the Modeller manual.Residue type was used as the alignment feature and maximum gap length was set to 20.The percentage identity matrix was generated using the Clustal Omega webserver (20), providing the previously described multiple sequence alignment.

Predicting Candidate Mutations with KIN
KIN was used to identify candidate single-or double-point mutations with the β-lactamase enzyme TEM1 used as the model system.TEM1 was chosen due the existence of deep mutational scanning data, in which the fitness value of almost all single point mutations in this enzyme was determined (21).The normalized (against wild-type TEM1) fitness values for each mutation were taken from the Supporting Information of ref. (21) without modification.Potential mutations of TEM1 were found by identifying alternative residue combinations in other β-lactamases that form a contact using a pair of residues present in TEM1.Due to the experimental data available, only single point mutations were used in the comparison.To perform filtering based on the distance from the active site, the minimum heavy atom distance between every residue in the crystal structure of TEM1 to the catalytic residues in the active site which are S70 and E166, using E. coli numbering (PDB ID:1BTL ( 22)).For each contact, the residue with the smallest distance to the active site was used to define its distance from the active site.For reference, the number of interactions obtained when performing analysis using only the crystallographic data (labelled "crystal") is also shown.The first violin plot is obtained using our approach with just the X-ray structures.The 3 subsequent approaches utilize the MD simulation data with different cutoffs applied to define if a contact is considered present.For example, a 10% cutoff would mean that a contact has to be present in at least 10% of frames for it to be included.indicated by a dotted line on the violin plot.In all cases, the mutations selected here come from crystal structure analysis.The distance of each mutation to the active site is based on the minimum heavy atom distance of each contact to the two catalytic residues: S70 and E166 (E. coli numbering, based on PDB ID:1BTL ( 22)) that make up the active site.We note that at a 25 Å distance cut-off, all residues are included in our analysis.

Figure S2 .
Figure S1.Histogram of the conservation scores removed when changing the MD cutoff filter

Figure S10 .
Figure S10.Comparison of the normalized fitness values of mutants selected by KIN using